
# Image-Text Fusion

Llama 4 Maverick 17B 128E Instruct (Meta Llama)

Llama 4 Maverick is a multimodal model released by Meta that supports text and image understanding. It uses a Mixture of Experts (MoE) architecture: the "17B 128E" in its name refers to 17 billion active parameters per token and 128 experts. Meta positions it as strong at multilingual text and code generation.

Multimodal Fusion

Tags: Transformers, Supports Multiple Languages
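Top-k expert routing, the core of an MoE layer like the one used in Llama 4 Maverick, can be sketched in plain Python. This is a generic illustration under stated assumptions, not Meta's implementation: the linear router, the toy experts, and the value of k are all hypothetical. It shows why an MoE model evaluates only a fraction of its parameters for each token.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token: Vector, router_weights: List[Vector],
                experts: List[Expert], k: int = 1) -> Vector:
    """Send one token to its top-k experts and mix their outputs by gate weight.

    Hypothetical toy router: one weight vector per expert, scored by dot product.
    """
    # Router score per expert: dot product of the token with that expert's router vector.
    logits = [sum(w * x for w, x in zip(wv, token)) for wv in router_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gates over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, top):
        y = experts[idx](token)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Four toy experts that scale their input by 1, 2, 3, 4 (hypothetical).
    experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    token = [2.0, 0.5]
    print(moe_forward(token, router, experts, k=1))  # [2.0, 0.5]: only expert 0 runs
    print(moe_forward(token, router, experts, k=2))  # weighted blend of experts 0 and 1
```

With k=1 each token touches a single expert's weights, so compute per token stays close to that of a much smaller dense model even though the total parameter count across all experts is far larger.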

Liquid is an autoregressive generation paradigm that unifies visual understanding and generation: it tokenizes images into discrete codes and learns the embeddings of these codes together with text-token embeddings in a shared feature space.

Tags: Transformers, English
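The shared-vocabulary idea described for Liquid can be illustrated with a toy example. This is a minimal sketch under assumptions, not Liquid's actual tokenizer: the text vocabulary size, the 4-entry codebook, the embedding size, and the patch features are all hypothetical. Each image feature is quantized to its nearest codebook entry, the resulting code is offset past the text vocabulary, and text and image ids are then embedded with one table.

```python
import random
from typing import List, Sequence

TEXT_VOCAB_SIZE = 8  # hypothetical toy text vocabulary: ids 0..7
CODEBOOK: List[List[float]] = [  # hypothetical 4-entry image codebook
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
EMBED_DIM = 3

# One shared embedding table: rows 0..7 for text tokens, rows 8..11 for image codes.
_rng = random.Random(0)
EMBEDDINGS = [
    [_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(TEXT_VOCAB_SIZE + len(CODEBOOK))
]


def quantize(patch: Sequence[float]) -> int:
    """Index of the nearest codebook vector (squared Euclidean distance)."""
    return min(
        range(len(CODEBOOK)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(patch, CODEBOOK[i])),
    )


def image_to_token_ids(patches: Sequence[Sequence[float]]) -> List[int]:
    """Quantize each patch feature, then shift its code past the text vocabulary."""
    return [TEXT_VOCAB_SIZE + quantize(p) for p in patches]


def embed(ids: Sequence[int]) -> List[List[float]]:
    """Look up text and image tokens in the same table."""
    return [EMBEDDINGS[i] for i in ids]


if __name__ == "__main__":
    text_ids = [1, 5]  # hypothetical ids for a short text prompt
    patches = [[0.9, 0.1], [0.1, 0.8], [0.95, 1.05]]  # toy patch features
    image_ids = image_to_token_ids(patches)
    print(image_ids)  # [9, 10, 11]
    sequence = text_ids + image_ids  # one mixed sequence for the model
    print(len(embed(sequence)))  # 5
```

In this toy setup, an autoregressive model trained on such sequences predicts the next id over all 12 entries, so emitting an id of 8 or above means generating an image code, which the image tokenizer's decoder would map back to pixels.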

Pixtral Large Instruct 2411

Pixtral-Large-Instruct-2411 is an instruction-tuned multimodal model from Mistral AI, built on Mistral Large 2. It accepts image and text input and can process multiple languages.

Tags: Transformers, Supports Multiple Languages


© 2025 AIbase